Abstract: Regular tree grammars and regular path expressions constitute
core constructs widely used in programming languages and type systems.
Nevertheless, there has been little research so far on reasoning frameworks
for path expressions where node cardinality constraints occur along a path
in a tree. We present a logic capable of expressing deep counting along paths
which may include arbitrary recursive forward and backward navigation. The
counting extensions can be seen as a generalization of graded modalities that
count immediate successor nodes. While the combination of graded modalities,
nominals, and inverse modalities yields undecidable logics over graphs, we
show that these features can be combined in a tree logic decidable in
exponential time.

Résumé: This paper introduces a tree logic, decidable in exponential time,
that can express cardinality constraints along multidirectional paths.

1 Introduction

A fundamental peculiarity of XML is the description of regular properties. For
example, in XML schema languages the content types of element definitions are
described using regular expressions. In addition, selecting nodes in
such constrained trees is also done by means of regular path expressions (à
la XPath). In both cases, it is often interesting to be able to express
conditions on the frequency of occurrences of nodes.

Even if we consider simple strings, it is well known that some formal
languages easily described in English may require voluminous regular
expressions. For instance, as pointed out in [HJJ+95], the language $L_{2a2b}$
of all strings over $\Sigma = \{a, b, c\}$ containing at least two occurrences
of $a$ and at least two occurrences of $b$ seems to require a large
expression, such as:

\[
\begin{array}{l}
\Sigma^* a \Sigma^* a \Sigma^* b \Sigma^* b \Sigma^* \;\cup\; \Sigma^* a \Sigma^* b \Sigma^* a \Sigma^* b \Sigma^* \\
\cup\; \Sigma^* a \Sigma^* b \Sigma^* b \Sigma^* a \Sigma^* \;\cup\; \Sigma^* b \Sigma^* b \Sigma^* a \Sigma^* a \Sigma^* \\
\cup\; \Sigma^* b \Sigma^* a \Sigma^* b \Sigma^* a \Sigma^* \;\cup\; \Sigma^* b \Sigma^* a \Sigma^* a \Sigma^* b \Sigma^*.
\end{array}
\]

If we added $\cap$ to the operators for forming regular expressions, then the
language $L_{2a2b}$ could be expressed more concisely as
$(\Sigma^* a \Sigma^* a \Sigma^*) \cap (\Sigma^* b \Sigma^* b \Sigma^*)$. In
logical terms, conjunction offers a first dramatic reduction in expression size.

If we now consider a formalism equipped with the ability to describe
numerical constraints on the frequency of occurrences, we get a second
(exponential) reduction in size. For instance, the above expression can be
formulated as $(\Sigma^* a \Sigma^*)^2 \cap (\Sigma^* b \Sigma^*)^2$. We can
even write $(\Sigma^* a \Sigma^*)^{2^n} \cap (\Sigma^* b \Sigma^*)^{2^n}$
instead of a (much) larger expression.
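
The size reduction discussed above can be checked on small inputs. The sketch below is only an illustration: it builds the six-way union with Python's `re` module and compares it with a direct count of occurrences on every string over $\{a, b, c\}$ up to a given length. The function names are hypothetical and not part of the report.

```python
import re
from itertools import product

SIGMA = "abc"
ANY = "[abc]*"  # plays the role of Sigma*

# The six interleavings of two a's and two b's, one per member of the union.
ORDERS = ["aabb", "abab", "abba", "bbaa", "baba", "baab"]
EXPANDED = re.compile("|".join(ANY + ANY.join(order) + ANY for order in ORDERS))


def in_language_expanded(word):
    """Membership in L_2a2b using the fully expanded regular expression."""
    return EXPANDED.fullmatch(word) is not None


def in_language_counting(word):
    """Membership in L_2a2b using counting constraints directly."""
    return word.count("a") >= 2 and word.count("b") >= 2


def agree_up_to(max_len):
    """Check that both definitions agree on all words of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            word = "".join(letters)
            if in_language_expanded(word) != in_language_counting(word):
                return False
    return True
```

The counting version stays the same size when the bound 2 is replaced by a larger constant, whereas the number of interleavings in the expanded union grows combinatorially.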

Different extensions of regular expressions with intersection, counting
constraints, and interleaving have been recently considered over strings, and
for describing content models of sibling nodes in XML type languages
[CGS09, GMN08, KT07]. The complexity of the inclusion problem over these
different language extensions and their combinations typically ranges from
polynomial to exponential space (see [GMN08] for a survey). The main
distinction between these works and the work presented here is that we focus
on counting nodes located along deep and recursive paths in trees.

When considering regular tree languages instead of regular string languages,
succinct syntactic sugars such as the ones presented above are even more useful,
as branching makes the situation more combinatorial compared to strings. In
the case of trees, it is often useful to express cardinality constraints not only on
the sequence of children nodes, but also in a particular region of a tree: in a
subtree for example. Suppose for instance that we want to define a tree language
over $\Sigma$ where there are no more than two "b" nodes. This seems to require a quite



RR n° 7251 



A Tree Logic with Graded Paths and Nominals 



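large regular tree type expression such as the one below:

\[
\begin{array}{rcl}
x_{root} & \to & b[x_{b \leq 1}] \mid c[x_{b \leq 2}] \mid a[x_{b \leq 2}] \\
x_{b \leq 2} & \to & \dots \mid x_{\neg b}, a[x_{b \leq 2}], x_{\neg b} \mid x_{\neg b}, c[x_{b \leq 2}], x_{\neg b} \mid x_{b \leq 1} \\
x_{b \leq 1} & \to & x_{\neg b} \mid x_{\neg b}, b[x_{\neg b}], x_{\neg b} \mid a[x_{b \leq 1}] \mid c[x_{b \leq 1}] \\
x_{\neg b} & \to & (a[x_{\neg b}] \mid c[x_{\neg b}])^*
\end{array}
\]

where $x_{root}$ is the starting non-terminal; $x_{\neg b}$, $x_{b \leq 1}$,
$x_{b \leq 2}$ are non-terminals; and the bracket notation $a[x_{\neg b}]$
describes a subtree whose root is labeled $a$ and in which there is no $b$ node.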

More generally, the widely adopted notations for regular tree grammars
produce very verbose definitions for properties involving cardinality
constraints on the nesting of elements.¹

The problem with regular tree (and even string) grammars is that one is
forced to fully expand all the patterns of interest using concatenation, union,
and Kleene star. Instead, it is often tempting to rely on another kind of (formal)
notation that just describes a simple pattern and additional constraints on it.
For instance, one could imagine denoting the previous example as follows, where
the additional constraint is described using XPath notation:

\[ (x \to (a[x] \mid b[x] \mid c[x])^*) \;\wedge\; \mathrm{count}(\texttt{/descendant-or-self::b}) \leq 2 \]

Although counting operators of this kind do not increase the expressive
power of regular tree grammars, they can have a drastic impact on
succinctness, thus making reasoning over these languages harder (as noticed
in [Gel08] in the case of strings). Indeed, reasoning on this kind of
extension without relying on its expansion (in order to avoid syntactic
blow-ups) is often tricky [GGM09]. Determining satisfiability, containment,
and equivalence over these classes of extended regular expressions typically
requires involved algorithms with extra complexity [MS72] compared to plain
vanilla regular expressions.

In the present paper, we propose a logical notation that happens to be
especially appropriate for describing many sorts of cardinality constraints
on the frequency of occurrence of nodes in regular tree types. Regular tree
types encompass most of the XML types (DTDs, XML Schemas, RelaxNGs) used in
practice today.

XPath is the standard query language for XML documents, and it is an
important part of other XML technologies such as XSLT and XQuery. XPath
expressions are regular path expressions interpreted as sets of nodes selected
from a given context node. In contrast with regular tree types, which only
express properties on children nodes, most of the expressive power of XPath
comes from the ability to perform multidirectional navigation, that is, XPath
expressions are able to express properties involving not only recursive
navigation, as for descendant nodes for instance, but also backward
navigation, as for ancestor nodes. Unfortunately, expressing cardinality
restrictions on nodes accessible by recursive multidirectional paths may
introduce an extra-exponential cost [GR05, tCM09], or may even lead to
undecidable formalisms [tCM09, DL06]. We propose in this paper a decidable
framework capable of succinctly expressing cardinality constraints along deep
multidirectional paths.

¹ This is typically the reason why the standard DTD for XHTML does not
syntactically prevent the nesting of anchors, whereas this nesting is actually
prohibited in the XHTML standard.

Figure 1: n-ary to binary trees

Contribution and Outline. We introduce a tree logic with counting operators
for expressing arbitrarily deep and recursive counting constraints in
Section 2. A sound and complete algorithm for testing satisfiability of
logical formulas in exponential time is presented in Section 3. Section 4
shows how the logic and the algorithm can be applied in the XML setting and in
particular for the static analysis of XPath expressions and common schemas
containing constraints on the frequency of occurrence of nodes. Finally, we
review related works in Section 5 before concluding in Section 6.

2 Counting Tree Logic 

We first present trees that we consider, and define a notion of trails in trees, 
before introducing the syntax and semantics of logical formulas. 

2.1 Trees 

We consider finite trees which are node-labeled and sibling-ordered. Since there
is a well-known bijective encoding between n-ary and binary trees, we focus
on binary trees without loss of generality. Specifically, we use the encoding
represented in Figure 1, where the binary representation preserves the first
child of a node and appends sibling nodes as second successors.

We consider the modalities "$\triangledown$" and "$\triangleright$". The
modality "$\triangledown$" labels the edge between a node and its first child.
The modality "$\triangleright$" labels the edge between a node and its next
sibling. We also consider the converse modalities "$\triangle$" and
"$\triangleleft$", which respectively label the same edges in the reverse
direction.
In order to define a simple set-theoretic semantics for the logic, we consider
trees in a way similar to Kripke structures for modal logics [Var98].
Specifically, we name $M = \{\triangledown, \triangleright, \triangle, \triangleleft\}$
the set of modalities. For $m \in M$ we denote by $\overline{m}$ the
corresponding inverse modality ($\overline{\triangledown} = \triangle$,
$\overline{\triangleright} = \triangleleft$, $\overline{\triangle} = \triangledown$,
$\overline{\triangleleft} = \triangleright$). We consider a countable alphabet
$P$ of propositions representing names of nodes. A node is labeled with
exactly one proposition.

A tree can then be seen as a tuple $(N, R, L)$, where: $N$ is a finite set of
nodes; $R$ is a partial mapping from $N \times M$ to $N$ that restricts the
labeling of edges to form a tree structure; and $L$ is a labeling function
from $N$ to $P$.
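
As an illustration of the tuple $(N, R, L)$, the following sketch encodes an n-ary tree with the first-child/next-sibling scheme described above. The names `DOWN`, `NEXT`, `UP`, `PREV` stand for the four modalities, and the nested-pair input format and the function `encode` are assumptions made for this example only.

```python
DOWN, NEXT, UP, PREV = "down", "next", "up", "prev"
INVERSE = {DOWN: UP, NEXT: PREV, UP: DOWN, PREV: NEXT}


def encode(tree):
    """Turn a nested pair (label, [children]) into a tuple (N, R, L).

    N is the list of node identifiers, R a dict from (node, modality)
    to node (a partial mapping), and L a dict from node to label.
    """
    N, R, L = [], {}, {}

    def link(src, modality, dst):
        # Every edge is stored together with its converse.
        R[(src, modality)] = dst
        R[(dst, INVERSE[modality])] = src

    def visit(node):
        label, children = node
        ident = len(N)
        N.append(ident)
        L[ident] = label
        previous = None
        for child in children:
            cid = visit(child)
            if previous is None:
                link(ident, DOWN, cid)     # first child
            else:
                link(previous, NEXT, cid)  # next sibling
            previous = cid
        return ident

    visit(tree)
    return N, R, L
```

For the tree `("a", [("b", []), ("c", [("d", [])]), ("e", [])])`, the root is linked by `DOWN` to `b` only, and `c` and `e` are reached through `NEXT` edges, so `UP` is defined only on first children.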

2.2 Trails 

Trails are defined as regular expressions formed by modalities, as follows: 

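\[
\begin{array}{rcl}
\alpha_0 & ::= & m \mid \alpha_0, \alpha_0 \mid \alpha_0 \,|\, \alpha_0 \\
\alpha & ::= & \alpha_0 \mid \alpha_0^{\star}, \alpha_0
\end{array}
\]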

We restrict trails to sequences of repeated subtrails (which contain no
repetition) followed by a subtrail (with no repetition). We also disallow
trails of the form $m, \overline{m}$, which may result in formulas with cycles.

The syntactic interpretation of trails corresponds to sets of sequences of
modalities (as in the usual semantics of regular expressions).

In a given tree, we say that there is a trail $\alpha$ from the node $n_0$ to
the node $n_k$, written $n_0 \xrightarrow{\alpha} n_k$, if and only if there is
a sequence of nodes $n_0, \dots, n_k$ and a sequence of modalities
$m_1, \dots, m_k$ that belongs to the syntactic interpretation of the trail
$\alpha$, such that $R(n_j, m_{j+1}) = n_{j+1}$, where $j = 0, \dots, k-1$. We
say that a path $\rho$ between two nodes belongs to a trail $\alpha$, written
$\rho \in \alpha$, if there exists a sequence of modalities between the nodes
that belongs to the interpretation of the trail.
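
The relation $n \xrightarrow{\alpha} n'$ can be computed by structural recursion on the trail. The sketch below is only illustrative: it assumes trails are given as nested tuples over general regular operators (empty trail, modality, sequence, choice, star), which is more permissive than the restricted grammar above, and it assumes $R$ is a Python dict from (node, modality) pairs to nodes. The function `reach` and the tuple tags are hypothetical names.

```python
def reach(R, trail, sources):
    """Return the set of nodes reachable from `sources` through `trail`."""
    kind = trail[0]
    if kind == "eps":                      # empty trail
        return set(sources)
    if kind == "mod":                      # single modality m
        m = trail[1]
        return {R[(n, m)] for n in sources if (n, m) in R}
    if kind == "seq":                      # alpha1, alpha2
        return reach(R, trail[2], reach(R, trail[1], sources))
    if kind == "alt":                      # alpha1 | alpha2
        return reach(R, trail[1], sources) | reach(R, trail[2], sources)
    if kind == "star":                     # alpha*: saturate until stable
        seen = set(sources)
        frontier = set(sources)
        while frontier:
            frontier = reach(R, trail[1], frontier) - seen
            seen |= frontier
        return seen
    raise ValueError(f"unknown trail constructor: {kind}")


# A root 0 with three children 1, 2, 3 in the binary encoding.
R = {(0, "down"): 1, (1, "up"): 0,
     (1, "next"): 2, (2, "prev"): 1,
     (2, "next"): 3, (3, "prev"): 2}

CHILDREN = ("seq", ("mod", "down"), ("star", ("mod", "next")))
EVERYWHERE = ("seq",
              ("star", ("alt", ("mod", "up"), ("mod", "prev"))),
              ("star", ("alt", ("mod", "down"), ("mod", "next"))))
```

Because the tree is finite, the saturation loop for the star always terminates: each round either adds a new node or stops.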

2.3 Syntax of Logical Formulas 

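The syntax of logical formulas is given in Figure 2, where $m \in M$ and
$k \in \mathbb{N}$. The syntax is shown in negation normal form, which can be
reached using the usual De Morgan rules together with the rules given in
Figure 3. The fact that the semantic interpretation is preserved even though
the least fixpoint does not become a greatest fixpoint is a consequence of
Lemma 2.1.

Defining an equality operator for counting formulas is straightforward:

\[ \langle\alpha\rangle_{=0}\,\psi \;\equiv\; \langle\alpha\rangle_{\leq 0}\,\psi \]

For $k > 0$, equality with $k$ is obtained as the conjunction
$\langle\alpha\rangle_{\leq k}\,\psi \wedge \langle\alpha\rangle_{>k-1}\,\psi$.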

2.4 Semantics of Logical Formulas 

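Formulas are interpreted as sets of nodes in a tree. A model of a formula is a
tree, such that the formula denotes a non-empty set of nodes in this tree. A
counting formula $\langle\alpha\rangle_{>k}\,\psi$ is interpreted as follows:
the set of nodes such that there are at least $k+1$ nodes satisfying $\psi$
through the trail $\alpha$. For example, the formula
$p_1 \wedge \langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>5}\,p_2$
denotes $p_1$ nodes with strictly more than 5 children nodes named $p_2$.

\[
\begin{array}{rcll}
\varphi & ::= & & \text{formula} \\
 & & \top \mid \neg\top & \text{true, false} \\
 & \mid & p \mid \neg p & \text{atomic prop (negated)} \\
 & \mid & x & \text{recursion variable} \\
 & \mid & \varphi \vee \varphi & \text{disjunction} \\
 & \mid & \varphi \wedge \varphi & \text{conjunction} \\
 & \mid & \langle m\rangle\varphi \mid \neg\langle m\rangle\top & \text{modality (negated)} \\
 & \mid & \langle\alpha\rangle_{\leq k}\,\psi \mid \langle\alpha\rangle_{>k}\,\psi & \text{counting} \\
 & \mid & \mu x.\psi & \text{fixpoint operator} \\
\psi & ::= & \top \mid \neg\top \mid p \mid \neg p \mid x \mid \psi \vee \psi \mid \psi \wedge \psi \mid \langle m\rangle\psi \mid \neg\langle m\rangle\top \mid \mu x.\psi &
\end{array}
\]

Figure 2: Syntax of Formulas (in Normal Form).

\[
\begin{array}{rclcrcl}
\neg\langle m\rangle\varphi & \equiv & \neg\langle m\rangle\top \vee \langle m\rangle\neg\varphi & \quad & \neg\mu x.\psi & \equiv & \mu x.\neg\psi\{x/\neg x\} \\
\neg\langle\alpha\rangle_{\leq k}\,\psi & \equiv & \langle\alpha\rangle_{>k}\,\psi & \quad & \neg\langle\alpha\rangle_{>k}\,\psi & \equiv & \langle\alpha\rangle_{\leq k}\,\psi
\end{array}
\]

Figure 3: Reduction to Negation Normal Form.

\[
\begin{array}{rcl}
[\![\neg\top]\!]_V & = & \emptyset \\
[\![p]\!]_V & = & \{n \mid L(n) = p\} \\
[\![\neg p]\!]_V & = & \{n \mid L(n) \neq p\} \\
[\![x]\!]_V & = & \{n \mid (n, x) \in V\} \\
[\![\langle m\rangle\varphi]\!]_V & = & \{n \mid R(n, m) \in [\![\varphi]\!]_V\} \\
[\![\neg\langle m\rangle\top]\!]_V & = & \{n \mid R(n, m) \text{ undefined}\} \\
[\![\langle\alpha\rangle_{\leq k}\,\psi]\!]_V & = & \{n \mid |\{n' \in [\![\psi]\!]_V \mid n \xrightarrow{\alpha} n'\}| \leq k\}
\end{array}
\]

Figure 4: Semantics of Formulas.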

In order to present the formal semantics of formulas, we introduce valuations.
Given a tree, a valuation $V$ is a binary relation between tree nodes and
variables. We write $V[N'/x]$, where $N'$ is a subset of the nodes, for the
relation denoted by $V$ extended with $(n, x)$ for every $n \in N'$. Given a
tree $T = (N, R, L)$ and a valuation $V$, the formal semantics of formulas is
given in Figure 4.

Intuitively, the formulas are interpreted as sets of nodes in a tree:
propositions denote the nodes where they occur; negation is interpreted as set
complement; disjunction and conjunction are respectively set union and
intersection; the least fixpoint operator performs finite recursive
navigation; and the counting operator denotes certain nodes, named the source
nodes, such that the nodes accessible from a single source through a trail
fulfill a cardinality restriction. A formula is said to be satisfiable when
its interpretation is not empty.
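
The set-based reading of formulas can be prototyped directly. The evaluator below is a minimal sketch under several assumptions: formulas are nested tuples with hypothetical tags, the least fixpoint is computed by iterating from the empty set (sound here because variables occur positively and the tree is finite), and a counting formula takes a Python function standing in for its trail. It is an illustration of the semantics, not the satisfiability algorithm of the report.

```python
def next_star(R, n):
    """Nodes reachable from n by zero or more next-sibling steps."""
    seen = {n}
    while (n, "next") in R:
        n = R[(n, "next")]
        seen.add(n)
    return seen


def evaluate(f, tree, V=None):
    """Return the set of nodes of `tree` = (N, R, L) denoted by formula f."""
    N, R, L = tree
    V = V or {}
    kind = f[0]
    if kind == "true":
        return set(N)
    if kind == "prop":
        return {n for n in N if L[n] == f[1]}
    if kind == "nprop":
        return {n for n in N if L[n] != f[1]}
    if kind == "var":
        return set(V[f[1]])
    if kind == "or":
        return evaluate(f[1], tree, V) | evaluate(f[2], tree, V)
    if kind == "and":
        return evaluate(f[1], tree, V) & evaluate(f[2], tree, V)
    if kind == "dia":                       # <m> f
        target = evaluate(f[2], tree, V)
        return {n for n in N if R.get((n, f[1])) in target}
    if kind == "ndia":                      # not <m> true
        return {n for n in N if (n, f[1]) not in R}
    if kind in ("gt", "le"):                # counting, > k or <= k
        _, trail_fn, k, g = f
        target = evaluate(g, tree, V)
        counts = {n: len(trail_fn(R, n) & target) for n in N}
        return {n for n in N if (counts[n] > k) == (kind == "gt")}
    if kind == "mu":                        # least fixpoint by iteration
        current = set()
        while True:
            following = evaluate(f[2], tree, {**V, f[1]: current})
            if following == current:
                return current
            current = following
    raise ValueError(f"unknown formula constructor: {kind}")


# A p1 root with children p2, p3, p2 (binary encoding).
TREE = ([0, 1, 2, 3],
        {(0, "down"): 1, (1, "up"): 0, (1, "next"): 2,
         (2, "prev"): 1, (2, "next"): 3, (3, "prev"): 2},
        {0: "p1", 1: "p2", 2: "p3", 3: "p2"})

TWO_P2_CHILDREN = ("and", ("prop", "p1"),
                   ("dia", "down", ("gt", next_star, 1, ("prop", "p2"))))
HAS_P2_DESCENDANT = ("dia", "down",
                     ("mu", "x", ("or", ("prop", "p2"),
                                  ("or", ("dia", "down", ("var", "x")),
                                   ("dia", "next", ("var", "x"))))))
```

Here `TWO_P2_CHILDREN` mirrors the example formula with bound 1 instead of 5, and a formula is satisfied by `TREE` exactly when `evaluate` returns a non-empty set.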

2.5 Restriction over Formulas 

We consider a syntactic restriction over formulas similar to the one in
[GLS07]: every formula of the logic must be cycle-free (so that the logic is
closed under negation [GLS07]). Intuitively, in a cycle-free formula, fixpoint
variables do not occur in the scope of both a modality and its converse. For
example, cycle-free trails are trails where both a subtrail and its converse
do not occur under the scope of the recursion operator. We do not consider
counting formulas under fixpoints nor under counting formulas.

Lemma 2.1. Let $\varphi$ be a cycle-free formula, and $T$ be a tree for which $[\![\varphi]\!]_{\emptyset} \neq \emptyset$.



Then there is a finite unfolding (j)' of the fixpoints of (p such that H^'I^Vm^ ^*}II 



Proof. As counting formulas may be replaced by non-counting formulas (at
the cost of an exponential blow-up), the proof is identical to the one in
[GLS07]. □

2.6 Global Counting Formulas and Nominals 

An interesting consequence of the inclusion of backward axes in trails is the
ability to reach every node in the tree from a given node of the tree, using
the trail $(\triangle|\triangleleft)^{\star}, (\triangledown|\triangleright)^{\star}$.²
We can thus select some nodes depending on some global counting property.
Consider the following formula, where $\#$ stands for one of the comparison
operators $\leq$, $>$, $=$.

\[ \langle(\triangle|\triangleleft)^{\star}, (\triangledown|\triangleright)^{\star}\rangle_{\# k}\,\varphi_1 \]

Intuitively, this formula considers each node $n$ of the tree, and counts how
many nodes in the whole tree satisfy $\varphi_1$. It then selects node $n$ if
and only if the count is compatible with the comparison considered. This
formula thus returns either every node of the tree, or the empty set. It is
then easy to restrict the selected nodes to those that satisfy a given
formula $\varphi_2$, using intersection.

\[ \langle(\triangle|\triangleleft)^{\star}, (\triangledown|\triangleright)^{\star}\rangle_{\# k}\,\varphi_1 \wedge \varphi_2 \]

This formula selects every node satisfying $\varphi_2$ if and only if there
are $\# k$ nodes satisfying $\varphi_1$, which we write as follows.

\[ \varphi_1\, \# k \implies \varphi_2 \]

We can now express existential properties, such as "select all nodes satisfying
$\varphi_2$ if there exists a node satisfying $\varphi_1$".

\[ \varphi_1 > 0 \implies \varphi_2 \]

We can also express universal properties, such as "select all nodes satisfying
$\varphi_2$ if every node satisfies $\varphi_1$".

\[ (\neg\varphi_1) \leq 0 \implies \varphi_2 \]

Another way to interpret global counting formulas is as a generalization of
the so-called nominals in the modal logics community [SV01]. Nominals are
special propositions whose interpretation is a singleton (they occur exactly
once in the model). They come for free with the logic. A nominal, denoted
"@n" in the remaining part of the paper, corresponds to the following global
counting formula:

\[ \langle(\triangle|\triangleleft)^{\star}, (\triangledown|\triangleright)^{\star}\rangle_{=1}\, n \]

where $n$ is a fresh atomic proposition.

² Note that this trail is cycle-free.

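Notice that we can also navigate to everywhere in a tree with only fixpoint
formulas, hence a nominal can alternatively be written as:

\[
\begin{array}{rcl}
@n & \equiv & n \wedge \neg[\mathrm{descendant}(n) \vee \mathrm{ancestor}(n) \;\vee \\
 & & \mathrm{desc\text{-}or\text{-}self}(\mathrm{siblings}(n)) \;\vee \\
 & & \mathrm{desc\text{-}or\text{-}self}(\mathrm{siblings}(\mathrm{ancestor}(n)))],
\end{array}
\]

where:

\[
\begin{array}{rcl}
\mathrm{descendant}(\varphi) & = & \langle\triangledown\rangle\mu x.\varphi \vee \langle\triangledown\rangle x \vee \langle\triangleright\rangle x \\
\mathrm{foll\text{-}sibling}(\varphi) & = & \mu x.\langle\triangleright\rangle\varphi \vee \langle\triangleright\rangle x \\
\mathrm{prec\text{-}sibling}(\varphi) & = & \mu x.\langle\triangleleft\rangle\varphi \vee \langle\triangleleft\rangle x \\
\mathrm{desc\text{-}or\text{-}self}(\varphi) & = & \mu x_0.\varphi \vee \langle\triangledown\rangle\mu x_1.x_0 \vee \langle\triangleright\rangle x_1 \\
\mathrm{ancestor}(\varphi) & = & \mu x.\langle\triangle\rangle(\varphi \vee x) \vee \langle\triangleleft\rangle x \\
\mathrm{siblings}(\varphi) & = & \mathrm{foll\text{-}sibling}(\varphi) \vee \mathrm{prec\text{-}sibling}(\varphi)
\end{array}
\]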

2.7 Graded Paths 

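Graded modalities have been introduced to count immediate successor nodes
in graphs [KSV02]. Specifically, graded modalities make it possible to restrict
the number of occurrences of immediate successors of a node in a graph by
means of an explicit constant upper bound and/or lower bound. Here we
consider trees and extend the "immediate successor" notion to nodes reachable
from any regular path, including reverse and recursive navigation.

A peculiarity of graded modalities in graphs is that they can be used inside
recursive formulas. A similar notion in trees consists in counting immediate
children nodes, as performed by the counting formula
$\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{\# k}\,\varphi$,
where $\varphi$ describes the property to be counted. It is then possible to
consider occurrences of this counting formula inside a fixpoint operator. This
is because this particular counting formula can be simply rewritten in terms
of plain vanilla logical formulas. For instance, the formula
$\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>1}\,p$
states the existence of at least two "p" children, and is translated into:

\[ \langle\triangledown\rangle\mu x.(p \wedge \langle\triangledown\rangle\mu y.p \vee \langle\triangleright\rangle y) \vee \langle\triangleright\rangle x \]

The general nesting scheme of this translation can be expressed as follows,
where the function $ch(\cdot)$ takes such a counting formula as input and
returns its translation:

\[
\begin{array}{rcl}
ch(\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>0}\,\varphi) & = & \langle\triangledown\rangle\mu x.\varphi \vee \langle\triangleright\rangle x \\
ch(\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>k+1}\,\varphi) & = & \langle\triangledown\rangle\mu x.(\varphi \wedge ch(\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>k}\,\varphi)) \vee \langle\triangleright\rangle x \\
ch(\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{\leq k}\,\varphi) & = & \neg ch(\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>k}\,\varphi)
\end{array}
\]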

We can even apply a recursive version of this transformation in order to rewrite 
nested counting formulas. 

In Lemma 3.12 we show that the computational cost of the translation does
not depend on the size of the formula, but on the nesting level of counting
subformulas.
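
To make the nesting scheme of $ch(\cdot)$ concrete, the following sketch prints the translation as a string, following the three equations as they are stated above. The ASCII syntax (`<down>`, `<next>`, `mu`, `&`, `|`, `~`), the fresh variable names `x0`, `x1`, ..., and the function names are assumptions made for readability; the scope of each fixpoint is parenthesised explicitly.

```python
def ch_gt(k, phi, depth=0):
    """Translation of the children-counting formula with bound '> k'.

    `phi` is the counted property as a string; `depth` only serves to
    generate a fresh fixpoint variable per nesting level.
    """
    x = f"x{depth}"
    if k == 0:
        return f"<down>mu {x}.({phi} | <next>{x})"
    nested = ch_gt(k - 1, phi, depth + 1)
    return f"<down>mu {x}.(({phi} & {nested}) | <next>{x})"


def ch_le(k, phi):
    """Translation for '<= k', obtained by negating the '> k' case."""
    return f"~({ch_gt(k, phi)})"
```

The output for bound $k$ contains $k + 1$ nested fixpoints, so the size of the translated formula grows with the constant $k$ rather than with the shape of the trail.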

The possibility of using an arbitrary fixpoint operator around a given formula
allows one to express the "until" operator, proposed for XPath by Marx [Mar05].
Owing to the previous translation, we can combine counting features with the
"until" operator and express properties that go beyond the expressive power of
the XPath 1.0 standard. For instance, the following formula states that "starting
from the current node, until we reach an ancestor named a, every ancestor has
at least 3 children named b":

\[ \mu x.(\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>2}\,b \wedge \mu y.\langle\triangle\rangle x \vee \langle\triangleleft\rangle y) \vee a \]

3 Satisfiability Algorithm 

We present a tableau-based algorithm for checking satisfiability of formulas. 
Given a formula, the algorithm seeks to build a satisfying tree. A satisfying 
tree is found if and only if the formula is satisfiable, otherwise the algorithm 
concludes that the formula is unsatisfiable. 

3.1 Overview 

The algorithm operates in two stages. 

First, a formula $\varphi$ is decomposed into a set of subformulas, called the
Lean. The Lean gathers all subformulas that are useful for determining the
truth status of the initial formula, while eliminating redundancies. For
instance, conjunctions and disjunctions are eliminated at this stage, since,
if a subformula $\varphi_1$ holds then one does not need to know the truth
status of $\varphi_2$ in order to determine the truth status of
$\varphi_1 \vee \varphi_2$. In fact, the Lean (defined in Section 3.2) only
gathers atomic propositions and modal subformulas. The Lean defines a finite
number of formulas that can be composed. The set of all these compositions
represents the exhaustive search universe in which the algorithm is looking
for a satisfying tree. A tree node corresponds to a valuation of the Lean
formulas.

The second stage of the algorithm consists in a least fixpoint computation
that builds every relevant binary tree in a bottom-up manner. At the first step of
this stage, all possible leaves are considered. At each further step, the algorithm
considers every possible parent node that can be connected with a node of the
previous steps. At each step, built subtrees are checked for consistency: for
instance, if a formula at a node $n$ involves a forward modality
$\langle\triangledown\rangle\varphi'$, then $\varphi'$ must be verified at the
first child of $n$. Reciprocally, due to converse modalities, a
given node may impose restrictions on its possible parent nodes. The algorithm
only considers consistent nodes at each step, meaning that the whole subtree of
a given node added at a given step provably satisfies a subformula, except its
potential top-level backward modalities that will be taken into account at the
next step. At each step, counting formulas are verified. Finally, the algorithm
terminates whenever:

• either a tree that satisfies the initial formula has been found, and its root
does not contain any pending (unproven) backward modality; or

• no more parent nodes can be considered (the exploration of the whole
search universe is complete): the formula is unsatisfiable.

The algorithm is proven sound and complete: $\varphi$ is satisfiable if and
only if a tree in which $\varphi$ is satisfied at some node is built. Thus
either such a tree is built, or $\varphi$ is not satisfiable.



3.2 Preliminaries 

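We first annotate every counting formula with a fresh counting proposition
$c$, written $\langle\alpha\rangle^{c}_{\# k}\,\varphi$. We first formally
define the notions of Lean and nodes. To this end, we first need to extract
navigating formulas from counting formulas.

\[
\begin{array}{rcl}
nav(x) & = & x \\
nav(p) & = & p \\
nav(\top) & = & \top \\
nav(c) & = & c \\
nav(\neg p) & = & \neg p \\
nav(\neg\langle m\rangle\top) & = & \neg\langle m\rangle\top \\
nav(\varphi_1 \wedge \varphi_2) & = & nav(\varphi_1) \wedge nav(\varphi_2) \\
nav(\varphi_1 \vee \varphi_2) & = & nav(\varphi_1) \vee nav(\varphi_2) \\
nav(\langle m\rangle\varphi) & = & \langle m\rangle nav(\varphi) \\
nav(\mu x.\psi) & = & \mu x.nav(\psi) \\
nav(\langle\alpha\rangle^{c}_{>k}\,\psi) & = & nav(\langle\alpha\rangle, \psi \wedge c) \\
nav(\langle\alpha\rangle^{c}_{\leq k}\,\psi) & = & nav(\langle\alpha\rangle, (\psi \wedge c) \vee (\neg\psi \wedge \neg c)) \\
nav(\langle\epsilon\rangle, \psi) & = & \psi \\
nav(\langle m\rangle, \psi) & = & \langle m\rangle\psi \\
nav(\langle\alpha_1, \alpha_2\rangle, \psi) & = & nav(\langle\alpha_1\rangle, nav(\langle\alpha_2\rangle, \psi)) \\
nav(\langle\alpha_1 | \alpha_2\rangle, \psi) & = & nav(\langle\alpha_1\rangle, \psi) \vee nav(\langle\alpha_2\rangle, \psi) \\
nav(\langle\alpha^{\star}\rangle, \psi) & = & \mu x.nav(\psi) \vee nav(\langle\alpha\rangle, x)
\end{array}
\]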
We define the Fischer-Ladner relation among formulas as follows, where $i = 1, 2$.

The Fischer-Ladner closure of a formula $\varphi$, written $FL(\varphi)$, is
the set defined as follows.

\[
\begin{array}{rcl}
FL(\varphi)_0 & = & \{\varphi\}, \\
FL(\varphi)_{i+1} & = & FL(\varphi)_i \cup \{\varphi' \mid R^{FL}(\varphi'', \varphi'),\ \varphi'' \in FL(\varphi)_i\}, \\
FL(\varphi) & = & FL(\varphi)_k,
\end{array}
\]

where $k$ is the smallest integer such that $FL(\varphi)_k = FL(\varphi)_{k+1}$.
Note that this set is finite: fixpoints are only expanded once.

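The Lean set of a formula $\varphi$ includes navigating formulas of the form
$\langle m\rangle\top$, every navigating formula of the form
$\langle m\rangle\varphi'$ from the Fischer-Ladner closure, every proposition
occurring in $\varphi$, written $P_{\varphi}$, every counting proposition,
written $C$, and an extra proposition that does not occur in $\varphi$, used
to represent other names, written $p_{\overline{\varphi}}$.

\[ Lean(\varphi) = \{\langle m\rangle\top\} \cup \{\langle m\rangle\varphi' \in FL(\varphi)\} \cup P_{\varphi} \cup C \cup \{p_{\overline{\varphi}}\} \]

A $\varphi$-node, written $n^{\varphi}$, is a non-empty subset of
$Lean(\varphi)$, such that:

• exactly one proposition from $P_{\varphi} \cup \{p_{\overline{\varphi}}\}$ is in each $\varphi$-node;

• when $\langle m\rangle\varphi' \in n^{\varphi}$, then $\langle m\rangle\top \in n^{\varphi}$; and

• both $\langle\triangle\rangle\top$ and $\langle\triangleleft\rangle\top$ cannot be in the same $\varphi$-node.

The set of $\varphi$-nodes is defined as $N^{\varphi}$.

Intuitively, the formula corresponding to a node $n^{\varphi}$ is the following.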

When the formula $\varphi$ under consideration is fixed, we often omit the
superscript.

A $\varphi$-tree is either the empty tree $\emptyset$, or a triple
$(n^{\varphi}, \Gamma_1, \Gamma_2)$ where $\Gamma_1$ and $\Gamma_2$ are
$\varphi$-trees.

We now turn to the definition of consistency of a $\varphi$-tree. First, we
define an entailment relation between a node and a formula in Figure 5.

We can now define the consistency relation between nodes of a $\varphi$-tree.
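\[
\frac{\psi \in n}{n \vdash^{\varphi} \psi}
\qquad
\frac{\psi \notin n}{n \vdash^{\varphi} \neg\psi}
\qquad
\frac{n \vdash^{\varphi} \psi_1 \quad n \vdash^{\varphi} \psi_2}{n \vdash^{\varphi} \psi_1 \wedge \psi_2}
\]

\[
\frac{n \vdash^{\varphi} \psi_1}{n \vdash^{\varphi} \psi_1 \vee \psi_2}
\qquad
\frac{n \vdash^{\varphi} \psi_2}{n \vdash^{\varphi} \psi_1 \vee \psi_2}
\qquad
\frac{n \vdash^{\varphi} \psi\{\mu x.\psi/x\}}{n \vdash^{\varphi} \mu x.\psi}
\]

Figure 5: Local entailment relation: between nodes and formulas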

Two nodes $n_1$ and $n_2$ are consistent under modality
$m \in \{\triangledown, \triangleright\}$, written $R^{\varphi}(n_1, m) = n_2$, iff

\[
\begin{array}{l}
\forall \langle m\rangle\psi \in Lean(\varphi),\quad \langle m\rangle\psi \in n_1 \iff n_2 \vdash^{\varphi} \psi \\
\forall \langle\overline{m}\rangle\psi \in Lean(\varphi),\quad \langle\overline{m}\rangle\psi \in n_2 \iff n_1 \vdash^{\varphi} \psi
\end{array}
\]

Consistency is checked each time a node is added to the tree, ensuring that
forward modalities of the node are indeed satisfied by the nodes below, and
that pending backward modalities of the node below are consistent with the
added node. Note that we do not check counting formulas at this point, as they
are globally verified in the next step.

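Upon generation of a finished tree, i.e., a tree with no pending backward
modality, one may check whether a node of this tree satisfies $\varphi$. To
this end, we first define forward navigation in a $\varphi$-tree $\Gamma$.
Given a path $\rho$ consisting of forward modalities, $\Gamma(\rho)$ is the
node at that path. It is undefined if there is no such node.

\[
\begin{array}{rcl}
(n, \Gamma_1, \Gamma_2)(\epsilon) & = & n \\
(n, \Gamma_1, \Gamma_2)(\triangledown\rho) & = & \Gamma_1(\rho) \\
(n, \Gamma_1, \Gamma_2)(\triangleright\rho) & = & \Gamma_2(\rho)
\end{array}
\]

We also allow extending the path with backward modalities if they match the
last modality of the path.

\[
\begin{array}{rcl}
(n, \Gamma_1, \Gamma_2)(\rho\triangledown\triangle) & = & (n, \Gamma_1, \Gamma_2)(\rho) \\
(n, \Gamma_1, \Gamma_2)(\rho\triangleright\triangleleft) & = & (n, \Gamma_1, \Gamma_2)(\rho)
\end{array}
\]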
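
The navigation equations above can be sketched as follows. The representation is an assumption for illustration: a $\varphi$-tree is either `None` (the empty tree) or a triple `(node, first, second)`, a node is a frozenset of strings standing for Lean formulas, and a path is a list of modality names. The sketch applies the two cancellation equations repeatedly by keeping a stack of pending forward steps, which is one possible reading of "matching the last modality of the path".

```python
DOWN, NEXT, UP, PREV = "down", "next", "up", "prev"
INVERSE = {UP: DOWN, PREV: NEXT}


def normalize(path):
    """Cancel each backward step against the forward step just before it.

    Returns the remaining forward-only path, or None if a backward step
    does not match the last pending forward modality.
    """
    stack = []
    for m in path:
        if m in INVERSE:
            if not stack or stack[-1] != INVERSE[m]:
                return None
            stack.pop()
        else:
            stack.append(m)
    return stack


def node_at(tree, path):
    """Return the node of `tree` at `path`, or None when it is undefined."""
    forward = normalize(path)
    if forward is None:
        return None
    for m in forward:
        if tree is None:
            return None
        tree = tree[1] if m == DOWN else tree[2]
    return None if tree is None else tree[0]


# A three-node tree: a p1 root whose first child is p2, followed by a p2 sibling.
LEAF = (frozenset({"p2", "<prev>T"}), None, None)
GAMMA0 = (frozenset({"p2", "<next>T", "<up>T", "<next>psi"}), None, LEAF)
ROOT = (frozenset({"p1", "<down>psi", "<down>T"}), GAMMA0, None)
```

With this reading, `node_at(ROOT, [DOWN, NEXT, PREV])` returns the node of `GAMMA0`, while a path starting with a backward step from the root is undefined.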

Now, we are able to define an entailment relation along paths in
$\varphi$-trees in Figure 6. This relation extends the local entailment
relation (Figure 5) with checks for counting formulas. Note that the case for
fixpoints is contained in the case for formulas with no counting subformula.
Note also that $\neg\psi$ in the "less than" case denotes the negation normal
form.

We conclude these preliminaries by introducing some final notations. The
root of a $\varphi$-tree is defined as follows.

\[
\begin{array}{rcl}
root(\emptyset) & = & \emptyset \\
root((n, \Gamma_1, \Gamma_2)) & = & n
\end{array}
\]
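\[
\frac{\varphi' \text{ does not contain counting formulas} \quad \Gamma(\rho) \vdash^{\varphi} \varphi'}{\rho \vdash_{\Gamma} \varphi'}
\qquad
\frac{\rho \vdash_{\Gamma} \varphi_1 \quad \rho \vdash_{\Gamma} \varphi_2}{\rho \vdash_{\Gamma} \varphi_1 \wedge \varphi_2}
\]

\[
\frac{\rho \vdash_{\Gamma} \varphi_1}{\rho \vdash_{\Gamma} \varphi_1 \vee \varphi_2}
\qquad
\frac{\rho \vdash_{\Gamma} \varphi_2}{\rho \vdash_{\Gamma} \varphi_1 \vee \varphi_2}
\qquad
\frac{\rho m \vdash_{\Gamma} \varphi'}{\rho \vdash_{\Gamma} \langle m\rangle\varphi'}
\]

\[
\frac{|\{n' \mid \rho' \in \alpha \wedge \Gamma(\rho\rho') = n' \wedge n' \vdash^{\varphi} \psi \wedge c\}| > k}{\rho \vdash_{\Gamma} \langle\alpha\rangle^{c}_{>k}\,\psi}
\]

\[
\frac{|\{n' \mid \rho' \in \alpha \wedge \Gamma(\rho\rho') = n' \wedge n' \vdash^{\varphi} \psi \wedge c\}| \leq k \qquad \forall \rho' \in \alpha,\ \Gamma(\rho\rho') \vdash^{\varphi} (\psi \wedge c) \vee (\neg\psi \wedge \neg c)}{\rho \vdash_{\Gamma} \langle\alpha\rangle^{c}_{\leq k}\,\psi}
\]

Figure 6: Global entailment relation (incl. counting formulas)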

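We extend this notion to multisets of trees and write $root(ST)$ for the
multiset of roots of the trees of $ST$.

The multiset of nodes of a tree is defined as follows.

\[
\begin{array}{rcl}
nodes(\emptyset) & = & \emptyset \\
nodes((n, \Gamma_1, \Gamma_2)) & = & \{n\} \cup nodes(\Gamma_1) \cup nodes(\Gamma_2)
\end{array}
\]

We also extend this notion to multisets of trees.

A $\varphi$-tree $\Gamma$ satisfies a formula $\varphi$, written
$\Gamma \vdash \varphi$, if neither $\langle\triangle\rangle\top$ nor
$\langle\triangleleft\rangle\top$ occurs in $root(\Gamma)$, and if there is a
path $\rho$ such that $\Gamma(\rho) = n$ and $n \vdash^{\varphi} \varphi$.

A multiset of trees $ST$ satisfies a formula $\varphi$, written
$ST \vdash \varphi$, when there is a syntactic tree $\Gamma \in ST$ such that
$\Gamma \vdash \varphi$.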

3.3 The Algorithm 

We are now ready to present the algorithm, which is parameterized by
$K(\varphi)$, the maximum number of occurrences of a given node in a path from
the root of the tree to a leaf. It builds consistent candidate trees from the
bottom up, and checks at each step if one of the built trees satisfies the
formula, returning 1 if it is the case. As the set of nodes from which to
build the trees is finite, it eventually stops and returns 0 if no satisfying
tree has been found.
Algorithm 1 Check Satisfiability of $\varphi$
$ST \leftarrow \emptyset$
repeat
    $AUX \leftarrow \{(n, \Gamma_1, \Gamma_2) \mid$ {we extend the trees}
        $nmax(n, \Gamma_1, \Gamma_2) \leq K(\varphi) + 2$ {with an available node}
        for $i$ in $\triangledown, \triangleright$ {and each child is either}
            $\Gamma_i = \emptyset$ and $\langle i\rangle\top \notin n$ {an empty tree}
            or $\Gamma_i \in ST$ {or a previously built tree}
                $\langle\overline{i}\rangle\top \in root(\Gamma_i)$ {with pending backward modalities}
                $R^{\varphi}(n, i) = root(\Gamma_i)\}$ {checking consistency}
    if $AUX \subseteq ST$ then
        return 0 {No new tree was built}
    end if
    $ST \leftarrow ST \cup AUX$
until $ST \vdash \varphi$
return 1



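\[
\begin{array}{l}
K(p) = K(\neg p) = K(\neg\langle m\rangle\top) = K(\top) = K(x) = 0 \\
K(\varphi_1 \wedge \varphi_2) = K(\varphi_1 \vee \varphi_2) = K(\varphi_1) + K(\varphi_2) \\
K(\langle m\rangle\varphi) = K(\mu x.\varphi) = K(\varphi)
\end{array}
\]

Figure 7: Occurrences bound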

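We now define the auxiliary nmax function as follows, where max is the usual
maximum function between integers.

\[
\begin{array}{rcl}
nmax(n, \Gamma_1, \Gamma_2) & = & \max(nmax(n, \Gamma_1), nmax(n, \Gamma_2)) \\
nmax(n, (n, \Gamma_1, \Gamma_2)) & = & 1 + nmax(n, \Gamma_1, \Gamma_2) \\
nmax(n, (n', \Gamma_1, \Gamma_2)) & = & nmax(n, \Gamma_1, \Gamma_2) \quad \text{if } n \neq n' \\
nmax(n, \emptyset) & = & 0
\end{array}
\]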
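
The nmax function is a direct structural recursion. The sketch below assumes the same illustrative representation of $\varphi$-trees as nested triples `(node, first, second)` with `None` for the empty tree; the split into two Python functions (`nmax` for a pair of subtrees, `nmax_tree` for a single tree) is a choice made here to mirror the two arities used in the definition.

```python
def nmax(n, first, second):
    """Largest number of occurrences of node n on a branch of either subtree."""
    return max(nmax_tree(n, first), nmax_tree(n, second))


def nmax_tree(n, tree):
    """Largest number of occurrences of node n on a root-to-leaf branch of tree."""
    if tree is None:
        return 0
    label, first, second = tree
    below = nmax(n, first, second)
    return below + 1 if label == n else below


A = frozenset({"p1"})
B = frozenset({"p2"})
T1 = (A, None, None)
T2 = (B, T1, None)
T3 = (A, T2, T1)
```

In the algorithm, this quantity is compared with the bound $K(\varphi) + 2$ before a new root $n$ is placed on top of two previously built subtrees.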

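Note that a formula $\mu x.\varphi$ can be rewritten into an equivalent
formula such that $x$ in $\varphi$ only occurs in formulas of the form
$\langle m\rangle x$. With this last observation, we now define the parameter
for the number of occurrences of the same node in the tree in Figure 7.

Consider for instance the formula
$\varphi = p_1 \wedge \langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>1}\,p_2$.
The computed Lean is as follows, where
$\psi = \mu x.p_2 \vee \langle\triangleright\rangle x$.

\[ \{p_1, p_2, p_3, \langle\triangledown\rangle\top, \langle\triangleright\rangle\top, \langle\triangle\rangle\top, \langle\triangleleft\rangle\top, \langle\triangledown\rangle\psi, \langle\triangleright\rangle\psi\} \]

Proposition $p_3$ represents names other than $p_1$ and $p_2$. We now compute
the bound on nodes: $K = 2$.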
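Figure 8: Checking $\varphi = p_1 \wedge \langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{>2}\,p_2$

After the first step, $ST$ consists of the trees of the form
$(\{p_i\}, \emptyset, \emptyset)$ and
$(\{p_i, \langle\overline{j}\rangle\top\}, \emptyset, \emptyset)$, with
$i \in \{1, 2, 3\}$ and $j \in \{\triangledown, \triangleright\}$. At this
point the three finished trees in $ST$ are tested and found not to satisfy
$\varphi$.

After the second iteration many trees are created, but the one of interest is
the following.

\[ \Gamma_0 = (\{p_2, \langle\triangleright\rangle\top, \langle\triangle\rangle\top, \langle\triangleright\rangle\psi\}, \emptyset, (\{p_2, \langle\triangleleft\rangle\top\}, \emptyset, \emptyset)) \]

The third iteration yields the tree
$(\{p_1, \langle\triangledown\rangle\psi, \langle\triangledown\rangle\top\}, \Gamma_0, \emptyset)$,
which is found to satisfy $\varphi$ at path $\epsilon$. As the nodes at every
step are different, the limit is not reached. Figure 8 depicts a graphical
representation of the example where counted nodes are drawn as thick circles.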

3.4 Termination 

Proving termination of the algorithm is straightforward, as only a finite number 
of trees may be built and the algorithm stops as soon as it cannot build a new 
tree. 

3.5 Soundness 

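If the algorithm terminates with a candidate, we show that the initial formula
is satisfiable. Let $\Gamma$ and $\rho$ be the $\varphi$-tree and path such
that $\rho \vdash_{\Gamma} \varphi$. We extract a tree from $\Gamma$ and show
that the interpretation of $\varphi$ for this tree includes the node at path
$\rho$.

We write $T(\Gamma)$ for the tree $(N, R, L)$ defined as follows. We first
rewrite $\Gamma$ such that each node $n$ is replaced by the path to reach it.

\[
\begin{array}{rcl}
path(n, \Gamma_1, \Gamma_2) & \to & (\epsilon, path(\triangledown, \Gamma_1), path(\triangleright, \Gamma_2)) \\
path(\rho, (n, \Gamma_1, \Gamma_2)) & \to & (\rho, path(\rho\triangledown, \Gamma_1), path(\rho\triangleright, \Gamma_2)) \\
path(\rho, \emptyset) & \to & \emptyset
\end{array}
\]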
We then define: 
• $N = nodes(path(\Gamma))$;

• for every $(\rho, \Gamma_1, \Gamma_2)$ in $path(\Gamma)$ and
$i = \triangledown, \triangleright$, if $\Gamma_i \neq \emptyset$ then
$R(\rho, i) = \rho i$ and $R(\rho i, \overline{i}) = \rho$; and

• for all $\rho \in N$, if $p \in \Gamma(\rho)$ then $L(\rho) = p$.

Lemma 3.1. Let $\psi$ be a subformula of $\varphi$ with no counting formula.
If $\Gamma(\rho) \vdash^{\varphi} \psi$ then we have
$\rho \in [\![\psi]\!]^{T(\Gamma)}_{\emptyset}$.

Proof. We proceed by induction on the lexical ordering of the number of
unfoldings of $\psi$ that are required for $T(\Gamma)$, and of the size of the
formula.

The base cases are $\top$, atomic or counting propositions, and negated forms.
These are immediate by definition of $[\![\psi]\!]^{T(\Gamma)}_{\emptyset}$.
The cases for disjunction and conjunction are immediate by induction (the
formula is smaller). The case for fixpoints is also immediate by induction, as
the number of unfoldings required decreases, and as
$[\![\mu x.\psi]\!]^{T(\Gamma)}_{\emptyset} = [\![\psi\{\mu x.\psi/x\}]\!]^{T(\Gamma)}_{\emptyset}$.

The last case is the presence of a modality $\langle m\rangle\psi$ from the
$\varphi$-node $\Gamma(\rho)$. In this case we rely on the fact that the nodes
$\Gamma(\rho m)$ and $\Gamma(\rho)$ are consistent to derive
$\rho m \vdash^{\varphi} \psi$. We then conclude by induction. □

Theorem 3.2 (Soundness). If $\rho \vdash_{\Gamma} \varphi$ then
$\rho \in [\![\varphi]\!]^{T(\Gamma)}_{\emptyset}$.

Proof. We proceed by induction on the derivation of $\rho \vdash_{\Gamma} \varphi$.

The proof is a consequence of the more general result
$\rho' \vdash_{\Gamma} \varphi' \implies \rho' \in [\![\varphi']\!]^{T(\Gamma)}_{\emptyset}$
for any subformula $\varphi'$ of $\varphi$, by induction on the derivation of
$\rho' \vdash_{\Gamma} \varphi'$. If $\varphi'$ has no counting formula, the
result is immediate by Lemma 3.1. Most cases are immediate by induction. As
concerns the case for counting formulas, each hypothesis
$n' \vdash^{\varphi} \psi \wedge c$ has as hypothesis
$n' \vdash^{\varphi} \psi$. This is enough to conclude by induction for the
"greater than" case. For the "less than" case, every node that is not counted
has to satisfy $\neg\psi \wedge \neg c$, so in particular $\neg\psi$, and we
conclude by induction. □

3.6 Completeness 

Our proof proceeds in two steps. We build a $\phi$-tree that satisfies the formula,
then we show that it is actually built by the algorithm.

Assume that formula $\phi$ is satisfiable by a tree $T$. We consider the smallest
such tree (i.e., the tree with the fewest nodes) and fix $n^{\star}$, a node
witnessing satisfiability.

We now build a $\phi$-tree homomorphic to $T$, called the Lean labeled version of
$\phi$, written $\Gamma(T, \phi)$, and defined as follows.

First, we annotate counted nodes along with their corresponding counting
proposition, yielding a new tree $T^c$. Starting from $n^{\star}$ and by induction on $\phi$,
we proceed as follows. For formulas with no counting subformula, including
recursion, we stop. For conjunction and disjunction of formulas, we recursively
annotate according to both subformulas. For modalities, we recursively annotate
from the node under the modality. For $\langle\alpha\rangle_{\leq k}\psi$, we annotate every selected node
with the counting proposition corresponding to the formula. For $\langle\alpha\rangle_{>k}\psi$, we
annotate exactly $k + 1$ selected nodes.

We now extend the semantics of formulas to take into account counting
propositions and annotated nodes, written $[\![\cdot]\!]^{T^c}_{V}$. The definition is identical
to Figure HI with one addition and two changes. The addition is for counting
propositions, which we define as $n \in [\![c]\!]^{T^c}_{V}$ iff $n$ is annotated by $c$. The two
changes are for counting formulas, which we define as follows, selecting only
nodes that are annotated.

\begin{align*}
[\![\langle\alpha\rangle_{\leq k}\psi]\!]^{T^c}_{V} &= \{ n, \; |\{ n' \in [\![\psi]\!]^{T^c}_{V} \cap [\![c]\!]^{T^c}_{V}, \; n \xrightarrow{\alpha} n' \}| \leq k \} \\
[\![\langle\alpha\rangle_{>k}\psi]\!]^{T^c}_{V} &= \{ n, \; |\{ n' \in [\![\psi]\!]^{T^c}_{V} \cap [\![c]\!]^{T^c}_{V}, \; n \xrightarrow{\alpha} n' \}| > k \}
\end{align*}

We show that this modification of the semantics does not change the
satisfiability of the formula.

Lemma 3.3. We have $n^{\star} \in [\![\phi]\!]^{T^c}_{\emptyset}$.

Proof. We proceed by recursion on the derivation $n^{\star} \in [\![\phi]\!]^{T}_{\emptyset}$. The cases where
no counting formula is involved, thus including fixpoints, are immediate, as
the selected nodes are identical. The disjunction, conjunction, and modality
cases are also immediate by induction. The interesting cases are the counting
formulas.

For $\langle\alpha\rangle_{>k}\psi$, as there are exactly $k+1$ nodes annotated, the property is true by
induction. For $\langle\alpha\rangle_{\leq k}\psi$, we rely on the fact that every counted node is annotated.
We conclude by remarking that $\psi$ does not contain a counting formula, thus we
have $[\![\psi]\!]^{T}_{V} = [\![\psi]\!]^{T^c}_{V}$ and $[\![\neg\psi]\!]^{T}_{V} = [\![\neg\psi]\!]^{T^c}_{V}$. □



To every node $n$, we associate $n^{\phi}$, a subset of formulas of the Lean selecting
the node.

$n^{\phi} = \{ \phi_0 \mid n \in [\![\phi_0]\!]^{T^c}_{\emptyset}, \; \phi_0 \in Lean(\phi) \}$

Note that this is a $\phi$-node as it contains one and exactly one proposition,
and if it includes a modal formula $\langle m \rangle \psi$, then it also includes $\langle m \rangle \top$.

The tree $\Gamma(T, \phi)$ is then built homomorphically to $T$.

In the remainder of this section, we write $\Gamma$ for $\Gamma(T, \phi)$. We first check that
$\Gamma$ is consistent, starting with local consistency.

In the following, we say a formula $\psi$ is induced by the lean of $\phi$, written
$\psi \mathrel{\dot\in} Lean(\phi)$, if it consists of the conjunction and disjunction of formulas from
the lean as defined in Figure 9.

Lemma 3.4. Let $\langle m \rangle \psi$ be a formula in $Lean(\phi)$, and let $\psi'$ be $\psi$ after unfolding
its fixpoint formulas not under modalities. We have $\psi' \mathrel{\dot\in} Lean(\phi)$.

Proof. By definition of the lean and of the $\dot\in$ relation. □

Lemma 3.5. Let $\psi$ be a formula induced by $Lean(\phi)$. We have $n \in [\![\psi]\!]^{T^c}_{\emptyset}$ if
and only if $n^{\phi} \vdash^{\phi} \psi$.
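\[
\frac{\psi \in Lean(\phi)}{\psi \mathrel{\dot\in} Lean(\phi)}
\qquad
\frac{\psi_1 \mathrel{\dot\in} Lean(\phi) \quad \psi_2 \mathrel{\dot\in} Lean(\phi)}{\psi_1 \wedge \psi_2 \mathrel{\dot\in} Lean(\phi)}
\]
\[
\frac{\psi_1 \mathrel{\dot\in} Lean(\phi) \quad \psi_2 \mathrel{\dot\in} Lean(\phi)}{\psi_1 \vee \psi_2 \mathrel{\dot\in} Lean(\phi)}
\qquad
\frac{}{\top \mathrel{\dot\in} Lean(\phi)}
\qquad
\frac{\psi \in (P \cup \langle m \rangle \top \cup C)}{\neg\psi \mathrel{\dot\in} Lean(\phi)}
\]

Figure 9: Formula induced by a lean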

Proof. We proceed by induction on $\psi$. The base cases (the formula is in the
$\phi$-node or is a negation of a lean formula not in the $\phi$-node) hold by definition
of $n^{\phi}$. The inductive cases are straightforward as these formulas only contain
fixpoints under modalities. □

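Lemma 3.6. Let $n_1$ and $n_2$ be such that $R(n_1, m) = n_2$ with $m \in \{\triangledown, \triangleright\}$. We
have $R^{\phi}(n_1^{\phi}, m) = n_2^{\phi}$.

Proof. Let $\langle m \rangle \psi$ be a formula in $Lean(\phi)$. We show that $\langle m \rangle \psi \in n_1^{\phi} \iff n_2^{\phi} \vdash^{\phi} \psi$.
We have $\langle m \rangle \psi \in n_1^{\phi}$ if and only if $n_1 \in [\![\langle m \rangle \psi]\!]^{T^c}_{\emptyset}$ by definition of $n_1^{\phi}$,
which in turn holds if and only if $n_2 = R(n_1, m) \in [\![\psi]\!]^{T^c}_{\emptyset}$. We now consider $\psi'$,
which is $\psi$ after unfolding its fixpoint formulas not under modalities. We have
$[\![\psi']\!]^{T^c}_{\emptyset} = [\![\psi]\!]^{T^c}_{\emptyset}$ and we conclude by Lemmas 3.4 and 3.5. □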



We now turn to global consistency, taking counting formulas into account.

Lemma 3.7. Let $\phi_s$ be a subformula of $\phi$, and $\rho$ be a path from the root in $T$
such that $T(\rho) \in [\![\phi_s]\!]^{T^c}_{\emptyset}$. We then have $\rho \vdash_{\Gamma} \phi_s$.

Proof. We proceed by induction on $\phi_s$.

If $\phi_s$ does not contain any counting formula, we consider $\phi_s'$, which is $\phi_s$ after
unfolding its fixpoint formulas not under modalities. We have $[\![\phi_s]\!]^{T^c}_{\emptyset} = [\![\phi_s']\!]^{T^c}_{\emptyset}$
and $\phi_s' \mathrel{\dot\in} Lean(\phi)$. We conclude by Lemma 3.5.

For most inductive cases, the proof is immediate by induction, as the formula
size decreases.

For $\langle\alpha\rangle_{>k}\psi$, we have by induction for every counted node $\Gamma(\rho\rho') \vdash^{\phi} \psi$ and
$\Gamma(\rho\rho') \vdash^{\phi} c$. We conclude by the conjunction rule and by the counting rule of
Figure 6.

For $\langle\alpha\rangle_{\leq k}\psi$, we proceed as above for the counted nodes. For the nodes that
are not counted, we have $[\![\neg\psi]\!]^{T}_{V} = [\![\neg\psi]\!]^{T^c}_{V}$ and by soundness, we have $\Gamma(\rho\rho') \vdash^{\phi} \neg\psi$.
We conclude by remarking that the node is not annotated by $c$, hence
$\Gamma(\rho\rho') \vdash^{\phi} \neg c$. □

We now need to show that the $\phi$-tree $\Gamma$ is actually built by the algorithm.
The proof that it is the case follows closely the one from [GLS07], with a crucial
exception: we need to make sure there are enough instances of each formula.
Indeed, in [GLS07], the algorithm uses a $\phi$-type (a subset of $Lean(\phi)$) at most
once on each branch from the root to a leaf of the built tree. This yields a
simple condition to stop the algorithm and conclude the formula is unsatisfiable.
However, in the presence of counting formulas, a given $\phi$-type may occur more
than once on a branch. To maintain the termination of the algorithm, we bound
the number of identical $\phi$-types that may be needed by $K(\phi)$ as defined in Figure 7.
We now check that this bound is sufficient to build a tree for any satisfiable
formula.

We recall that $\phi$ is a satisfiable formula, $T$ is a smallest tree such that $\phi$
is satisfied, and $n^{\star}$ is a witness of satisfiability.

We proceed in two steps: first we show that counted nodes (with counted
propositions) imply a bound on the number of identical $\phi$-types on a branch for
a smallest tree. Second, we show that this minimal marking is bounded by $K(\phi)$.

In the following, we call counted nodes and node $n^{\star}$ annotations.

We now define the projection of an annotation on a path. Let $\rho$ be a path
from the root of the tree to a leaf. An annotation projects on $\rho$ at $\rho_1$ if $\rho = \rho_1\rho_2$,
the annotation is at $\rho_1\rho_m$, and $\rho_2$ shares no prefix with $\rho_m$.
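
Read this way, the projection point $\rho_1$ is the longest common prefix of the branch and of the path leading to the annotation. A minimal sketch, assuming paths are encoded as strings over the steps `"1"` and `"2"` (an encoding chosen here for illustration; the function name is hypothetical):

```python
def projection_point(rho, annotation_path):
    """Return rho_1: the longest common prefix of the branch `rho` and of the
    path to the annotation, so that the two remainders share no prefix."""
    i = 0
    limit = min(len(rho), len(annotation_path))
    while i < limit and rho[i] == annotation_path[i]:
        i += 1
    return rho[:i]
```

An annotation located on the branch itself projects at its own position, and an annotation that leaves the branch at the root projects at the empty path.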

Lemma 3.8. Let $T^c$ be the annotated tree, $\rho$ a path from the root of the tree
to a leaf, $n_1$ and $n_2$ two distinct nodes of $\rho$ such that $n_1^{\phi} = n_2^{\phi}$. Then either
annotations project both on $\rho$ at $n_1$ and $n_2$, or an annotation projects strictly
between $n_1$ and $n_2$.

Proof. We proceed by contradiction: we assume there is no annotation that
projects between $n_1$ and $n_2$ and at most one of them has an annotation that
projects on it. Without loss of generality, we assume that $n_2$ is below $n_1$ in the
tree.

Assume neither $n_1$ nor $n_2$ is annotated (through projection). We show that
the tree where $R(n_1, \triangledown) \leftarrow R(n_2, \triangledown)$ and $R(n_1, \triangleright) \leftarrow R(n_2, \triangleright)$ still satisfies $\phi$ at
$n^{\star}$, a contradiction since this tree is strictly smaller. Let $T_s$ be this smaller tree,
$\Gamma_s$ the corresponding $\phi$-tree, and for every path $\rho$ of $\Gamma$, let $\rho_s$ be the potentially
shorter path if it exists (i.e., if it was not removed when pruning the tree). More
precisely, let $\rho_1$ be the path to $n_1$ and $\rho_1\rho_2$ be the path to $n_2$. If $\rho' = \rho_1'\rho_2'$ where
$\rho_1'$ is a prefix of $\rho_1$ and the paths are disjoint from there, then $\Gamma_s(\rho') = \Gamma(\rho')$.
If $\rho' = \rho_1\rho_2\rho_3$, then $\Gamma_s(\rho_1\rho_3) = \Gamma(\rho')$.

First, as there was no annotation projected, $n^{\star}$ is still part of this tree at a
path $\rho_s$. We show that we have $\rho_s \vdash_{\Gamma_s} \phi$ by induction on the derivation $\rho \vdash_{\Gamma} \phi$.

Let $\rho' \vdash_{\Gamma} \phi'$ in the derivation, assuming that $\rho'_s$ is defined.

The case where $\phi'$ does not mention any counting formula is trivial: $\Gamma(\rho') =
\Gamma_s(\rho'_s)$, thus local entailment is immediate.

Conjunction and disjunction are also immediate by induction.

For the modality case, we first need to prove an additional property. If
$\rho' \vdash_{\Gamma} \langle m \rangle \phi'$ and $\phi'$ contains a counting formula, then $\rho' m$ is either a prefix
of $\rho_1$ followed by a disjoint path, or it includes $\rho_1\rho_2$. We prove this property
by contradiction. The formula $\langle m \rangle \phi'$ is both in $\Gamma(\rho_1)$ and in $\Gamma(\rho_1\rho_2)$. We
consider the outermost counting formula in $\phi'$, which we write $\phi'_c$. Its presence
implies the occurrence of a counting proposition $c$ in the formula. Since counting
propositions are distinct for distinct syntactic occurrences of a formula, this
implies that the corresponding counting proposition is either under a fixpoint
(which is impossible), or under an enclosing counting formula, which is also
impossible. We thus have a contradiction.

We now turn to the counting case, for formulas of the form $\langle\alpha\rangle_{\leq k}\psi$ or $\langle\alpha\rangle_{>k}\psi$. We say that a path does not
cross over when this path contains neither $n_1$ nor $n_2$. For nodes that are
reached using paths that do not cross over, we conclude by induction that they
are also counted. We now show that the remaining nodes for which a crossover
happened are also reached. Without loss of generality, assume that $\rho'$ is a prefix
of $\rho_1$ (the counting formula is in the "top" part of the tree), and let $\rho_\alpha$ be the
path from the counting formula to the counted node ($\rho_\alpha$ is an instance of the
trail $\alpha$). This path is of the shape $\rho_1'\rho_2\rho_c$, with $\rho_1 = \rho'\rho_1'$. We now show that
the path $\rho_1'\rho_c$ is an instance of $\alpha$ if and only if $\rho_\alpha$ is, thus the same node is still
counted.

Recall that $\alpha$ is of the shape $\alpha_1, \ldots, \alpha_n, \alpha_{n+1}$ where $\alpha_1$ to $\alpha_n$ are of the form
$\alpha^{\star}$ and where $\alpha_{n+1}$ does not contain a repeated trail. We say that a prefix
$\rho_p$ of a path $\rho$ stops at $i$ if there is a suffix $\rho_s$ such that $\rho_p\rho_s$ is still a prefix
of $\rho$, if $\rho_p\rho_s \in \alpha_1, \ldots, \alpha_i$, and if there is no shorter suffix $\rho_s'$ and $j$ such that
$\rho_p\rho_s' \in \alpha_1, \ldots, \alpha_j$. (Intuitively, $\alpha_i$ is the trail being used when matching the end
of $\rho_p$.) Note that $i$ may not be unique as a path may be matched in different
ways by a trail. We now show that there are $i \leq j \leq n$ such that both $\rho_1'$ stops
at $i$ and $\rho_1'\rho_2$ stops at $j$. We thus show that $j$ cannot be $n + 1$. Recall that
$\alpha_{n+1}$ does not contain a repeated subtrail. If $n_2^{\phi}$ does not contain the counted
proposition $c$ (which may happen in the case of a "less than" counting where
the target is not counted), then neither does $n_1^{\phi}$, which is a contradiction to
the fact that $\alpha_1, \ldots, \alpha_{n+1}$ is not empty (in that case the counted proposition is
necessarily mentioned). Thus $n_2^{\phi}$ contains formulas without a fixpoint (as the
trail is not repeated) mentioning $c$. Consider the largest such formula. By an
induction on the path $\rho_2$, we build a strictly larger formula that occurs in $n_1^{\phi}$.
This is a contradiction to the hypothesis that $n_1^{\phi} = n_2^{\phi}$.

We now consider the suffixes $\rho_s^i$ and $\rho_s^j$ computed when stating that the
paths stop at $i$ and $j$. These suffixes correspond to the path matching the
end of $\alpha_i$ and $\alpha_j$, respectively (before the next iteration or switching to the
next formula). They have matching formulas in $n_1^{\phi}$ and $n_2^{\phi}$. As the formulas
are present in both nodes, the remainders of the paths ($\rho_2\rho_c$ and $\rho_c$) are
instances of $(\rho_s^i \mid \rho_s^j)\alpha_i \ldots \alpha_{n+1}$, thus $\rho_1'\rho_c$ is an instance of $\alpha$ if and only if $\rho_\alpha$ is.

In the case of "greater than" counting, we conclude immediately by induction
as the same nodes are selected (thus there are enough). In the case of "less than",
we need to check that no new node is counted in the smaller tree. Assume
it is not the case for the formula $\langle\alpha\rangle_{\leq k}\psi$, thus there is a path $\rho_\alpha \in \alpha$ to a
node satisfying $\psi$. As the same node can be reached in $\Gamma$, and as we have
$\Gamma(\rho'\rho_\alpha) \vdash^{\phi} \neg\psi$ by induction, we have a contradiction.

This concludes the proof when neither $n_1$ nor $n_2$ is annotated. The proof
is identical when $n_2$ is annotated. If $n_1$ is annotated, we look at the first
modality between $n_1$ and $n_2$. If it is a $\triangledown$, then we build the smaller tree by
doing $R(n_1, \triangledown) \leftarrow R(n_2, \triangledown)$ (we remove the $\triangleright$ subtree from $n_2$ instead of $n_1$).
Symmetrically, if the first modality is a $\triangleright$, we consider $R(n_1, \triangleright) \leftarrow R(n_2, \triangleright)$ as the
smaller tree. The rest of the proof proceeds as above. □

Theorem 3.9 (Completeness). If $\phi$ is satisfiable, then a satisfying tree is built.

Proof. The proof proceeds as in [GLS07]; we only need to check there are enough
copies of each node to build every path. Let $\rho$ be a path from the root of the tree
to the leaves. By Lemma 3.8 there are at most $n + 1$ identical nodes in this path,
where $n$ is the number of marks. The number of marks is $c + 1$ where $c$ is the
number of counted nodes. We show by an immediate induction on the formula
$\phi$ that $c$ is bounded by $K(\phi)$ as defined in Figure 7. We conclude by remarking
that $K(\phi) + 2$ is the number of identical nodes we allow in the algorithm. □
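
The counting argument of this proof can be spelled out as a single chain of inequalities, using only the quantities named above ($n$ marks, $c$ counted nodes, and the bound $K(\phi)$); this is a restatement, not an additional result:

\[
\#\{\text{identical nodes on } \rho\} \;\leq\; n + 1 \;=\; (c + 1) + 1 \;=\; c + 2 \;\leq\; K(\phi) + 2 .
\]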

3.7 Complexity 

We now show that the complexity of the satisfiability algorithm is exponential
time w.r.t. the formula size. This is achieved in two steps: we first show that
the Lean size is linear w.r.t. the formula size, then we show that the algorithm
has a single exponential complexity w.r.t. the Lean size.

Lemma 3.10. The Lean size is linear in terms of the original formula size. 

Proof Sketch. It was shown in [GLS07] that the Lean size of non-counting
formulas is linear with respect to the formula size.

We now describe the case for counting formulas. Note that each counting
formula introduces only one new counting proposition in the Lean. A first
duplication of formulas is considered in the construction of the Lean for "less
than" counting formulas: both the formula witnessing the counted nodes and
its negation are considered. Furthermore, another duplication is introduced
for counting formulas of the form $\langle \alpha_1 \mid \alpha_2 \rangle_{\# k}\,\psi$. Each of these duplications only
doubles the size of the Lean. Hence, the Lean size remains linear w.r.t. the
original formula size. □

Theorem 3.11. The satisfiability problem for the logic is decidable in time
$2^{O(n)}$, where $n$ is the Lean size.

Proof Sketch. The cardinality of the set of nodes is $2^n$. The number of occurrences
of each node in the tree is bounded by $K(\phi) \leq k \cdot m$, where $k$ is the greatest
constant occurring in the counting formulas and $m$ is the number of counting
subformulas. Hence the number of steps in the algorithm is bounded by $2^n \cdot k \cdot m$.

As for the functions at each step, nmax is a single traversal of the tree.
Since the entailment relation involved in the definition of $R^{\phi}$ is only local, $R^{\phi}$
is performed in linear time.

The number of choices to form trees (triples) at each step is restricted by
$3 \cdot (2^n \cdot k \cdot m)$.
The global entailment relation involves four exponential time traversals: the
number of trees, the number of nodes at each tree, the number of traversals
for the entailment relation of counting formulas, and the cost of each of such
traversals. Hence it takes no more than $4 \cdot (2^n \cdot k)$ time. □



Theorem 3.11 states the complexity for the logic defined in Figure 2. We now
state that the same complexity upper-bound holds if we additionally consider
counting formulas of the form $\langle\triangledown\rangle\langle\triangleright^{\star}\rangle_{\# k}\,\psi$ in the scope of a fixpoint operator
(as presented in Section 2.7).



Lemma 3.12. Given a formula $\phi$ where counting subformulas $\psi$ only count
children nodes, if every counting subformula $\psi$ is replaced by the equivalent
fixpoint formula $ch(\psi)$ in $\phi$, written $\phi[ch(\psi)/\psi]$, then $Lean(\phi[ch(\psi)/\psi]) \leq Lean(\phi) \cdot k^{l}$,
where $k$ is the greatest numerical constraint of the counting subformulas, and $l$ is
the greatest nesting level of counting subformulas.

Proof Sketch. It is proven by induction on the structure of $\phi$, and in the case of
counting formulas, another induction is done on the numerical constraint. □

Corollary 3.13. The logic supporting counting formulas only on children in the
scope of fixpoint formulas or another counting formula is decidable in $2^{n \cdot k^{l}}$,
where $k$ is the greatest cardinality constraint and $l$ is the greatest nesting level
of counting formulas.

Proof. Immediate from Theorem 3.11 and Lemma 3.12. □

4 Application to XML Trees 

4.1 XPath Expressions 

XPath [CD99] was introduced as part of the W3C XSLT transformation language
to have a non-XML format for selecting nodes and computing values from
an XML document (see [GLS07] for a formal presentation of XPath). Since then
XPath has become part of several other standards; in particular, it forms the
"navigation subset" of the XQuery language.

In their simplest form XPath expressions look like "directory navigation
paths". For example, the XPath

/company/personnel/employee

navigates from the root of a document through the top-level "company" node
to its "personnel" child nodes and on to its "employee" child nodes. The result
of the evaluation of the entire expression is the set of all the "employee" nodes
that can be reached in this manner. At each step in the navigation, the selected
nodes for that step can be filtered with a test. Of special interest to us are the
predicates that test a node count or the selected node's position in the previous
step's selection. For example, if we ask for

/company/personnel/employee[position()=2]

then the result is all employee nodes that are the second employee node among
the employee child nodes of each personnel node selected by the previous step.
XPath also makes it possible to combine the capability of searching along
"axes" other than the shown "children of" with counting constraints. For
example, if we ask for

/company[count(descendant::employee)<=300]/name

then the result consists of the names of companies with at most 300 employees in
total (the axis "descendant" is the transitive closure of the default, and often
omitted, axis "child").
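
The two queries above can be tried on a small document. The following is a sketch only: the document is made up for illustration, and it uses Python's `xml.etree.ElementTree`, which supports positional predicates such as `[2]` but not `count()`, so the descendant count is computed explicitly. The helper name `company_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for illustration.
DOC = """
<company>
  <name>Acme</name>
  <personnel>
    <employee>Ann</employee>
    <employee>Bob</employee>
    <employee>Cid</employee>
  </personnel>
  <personnel>
    <employee>Dee</employee>
  </personnel>
</company>
"""

root = ET.fromstring(DOC)  # root is the <company> element

# /company/personnel/employee[position()=2], written [2] in ElementTree.
second = [e.text for e in root.findall("./personnel/employee[2]")]


def company_names(company, bound=300):
    """/company[count(descendant::employee)<=bound]/name, counted by hand."""
    count = sum(1 for _ in company.iter("employee"))
    return [n.text for n in company.findall("./name")] if count <= bound else []
```

Only the first "personnel" node has a second employee, so `second` holds a single name; lowering `bound` below the total number of employees makes `company_names` return an empty list.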

The syntax and semantics of Core XPath expressions are respectively given
in Figure 10 and Figure 11. An XPath expression is interpreted as a relation
between nodes. The considered XPath fragment allows absolute and relative paths,
path union, intersection, composition, as well as node tests and qualifiers with
counting operators, conjunction, disjunction, negation, and path navigation.
Furthermore, it supports all XPath axes allowing multidirectional navigation.



Axis       ::= self | child | parent | descendant | ancestor |
               following-sibling | preceding-sibling |
               following | preceding
NameTest   ::= QName | *
Step       ::= Axis::NameTest
PathExpr   ::= PathExpr/PathExpr | PathExpr[Qualifier] | Step
Qualifier  ::= PathExpr | CountExpr | not Qualifier |
               Qualifier and Qualifier | Qualifier or Qualifier | @n
CountExpr  ::= count(PathExpr') Comp k
PathExpr'  ::= PathExpr'/PathExpr' | PathExpr'[Qualifier'] | Step
Qualifier' ::= PathExpr' | not Qualifier' | Qualifier' and Qualifier'
               | Qualifier' or Qualifier' | @n
Comp       ::= ≤ | > | ≥ | < | =
XPath      ::= PathExpr | /PathExpr | XPath union PathExpr |
               XPath intersect PathExpr | XPath except PathExpr

Figure 10: Syntax of Core XPath Expressions.

It was already observed in [GR05, tCM09] that using positional information
in paths reduces to counting (at the cost of an exponential blow-up). For
example, the expression

child::a[position()=5]
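\begin{align*}
[\![\text{Axis::NameTest}]\!] &= \{ (x, y) \in N^2 \mid x(\text{Axis})y \text{ and } y \text{ satisfies NameTest} \} \\
[\![\text{/PathExpr}]\!] &= \{ (r, y) \in [\![\text{PathExpr}]\!] \mid r \text{ is the root} \} \\
[\![P_1/P_2]\!] &= [\![P_1]\!] \circ [\![P_2]\!] \\
[\![P_1 \text{ union } P_2]\!] &= [\![P_1]\!] \cup [\![P_2]\!] \\
[\![P_1 \text{ intersect } P_2]\!] &= [\![P_1]\!] \cap [\![P_2]\!] \\
[\![P_1 \text{ except } P_2]\!] &= [\![P_1]\!] \setminus [\![P_2]\!] \\
[\![\text{PathExpr[Qualifier]}]\!] &= \{ (x, y) \in [\![\text{PathExpr}]\!] \mid y \in [\![\text{Qualifier}]\!]_{Qualif} \} \\
[\![\text{PathExpr}]\!]_{Qualif} &= \{ x \mid \exists y.\, (x, y) \in [\![\text{PathExpr}]\!] \} \\
[\![\text{count(PathExpr) Comp } k]\!]_{Qualif} &= \{ x \in N \mid |\{ y \mid (x, y) \in [\![\text{PathExpr}]\!] \}| \text{ satisfies Comp } k \} \\
[\![\text{not } Q]\!]_{Qualif} &= N \setminus [\![Q]\!]_{Qualif} \\
[\![Q_1 \text{ and } Q_2]\!]_{Qualif} &= [\![Q_1]\!]_{Qualif} \cap [\![Q_2]\!]_{Qualif} \\
[\![Q_1 \text{ or } Q_2]\!]_{Qualif} &= [\![Q_1]\!]_{Qualif} \cup [\![Q_2]\!]_{Qualif}
\end{align*}

Figure 11: Semantics of Core XPath Expressions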
first selects the "a" nodes occurring as children of the current context node, and
then keeps those occurring at the 5th position. This expression can be rewritten
into the semantically equivalent expression:

child::a[count(preceding-sibling::a)=4]

which constrains the number of preceding siblings named "a" to 4, so that the
qualifier becomes true only for the 5th child "a". A general translation of
positional information in terms of counting operators [GR05, tCM09] is summarized
in Figure 12, where $\ll$ denotes the document order (depth-first left-to-right)
relation in a tree. Note that translated path expressions can in turn be expressed
in the core XPath fragment of Figure 10 (at the cost of another exponential
blow-up). Indeed, expressions like PathExpr1/(PathExpr2 except PathExpr3)/PathExpr4
must be rewritten into expressions where binary connectives for paths occur only
at top level, as in:

PathExpr1/PathExpr2/PathExpr4 except
PathExpr1/PathExpr3/PathExpr4
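
The rewriting of a positional test into a count over preceding siblings, described above, can be checked on a flat list of children. This is a minimal sketch under the assumption that the children of the context node are given as a list of element names; both function names are hypothetical.

```python
def by_position(children, name, pos):
    """Indices selected by child::name[position()=pos] (positions start at 1)."""
    matches = [i for i, tag in enumerate(children) if tag == name]
    return matches[pos - 1:pos]


def by_preceding_count(children, name, pos):
    """Indices selected by child::name[count(preceding-sibling::name)=pos-1]."""
    return [i for i, tag in enumerate(children)
            if tag == name and children[:i].count(name) == pos - 1]
```

Both functions return the same indices for every position, which is the equivalence used in the text for the 5th "a" child.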



PathExpr[position() = 1]     ≡ PathExpr except (PathExpr/≪)
PathExpr[position() = k + 1] ≡ (PathExpr intersect
                                 (PathExpr[k]/≪))[position() = 1]
≪                            ≡ (descendant::*) union (a-or-s::*/
                                 following-sibling::*/d-or-s::*)
a-or-s::*                    ≡ ancestor::* union self::*

Figure 12: Positional Information as Syntactic Sugars [GR05, tCM09]

We focus on Core XPath expressions involving the counting operator (see
Figure 10). The XPath fragment without the counting operator (the navigational
fragment) was already linearly translated into the μ-calculus in [GLS07]. The
contributions presented in this paper make it possible to equip this navigational
fragment with counting features such as the ones formulated above. Logical formulas
capture the aforementioned XPath counting constraints. For example, consider
the following XPath expression:

child::a[count(descendant::b[parent::c])>5]

This expression selects the children nodes named "a" provided they have more
than 5 descendants which (1) are named "b" and (2) whose parent is named
"c". The logical formula denoting the set of children nodes named "a" is:

$\psi = a \wedge \langle \triangleleft^{\star}, \vartriangle \rangle \top$

The logical translation of the above XPath expression is:

$\psi \wedge \langle\triangledown\rangle\langle(\triangledown \mid \triangleright)^{\star}\rangle_{>5}\,(b \wedge \mu x.\langle\vartriangle\rangle c \vee \langle\triangleleft\rangle x)$

This formula holds for nodes selected by the XPath expression. A correspondence
between the main XPath axes over unranked trees and modal formulas
over binary trees is given in Figure 13. In this figure, each logical formula holds
for nodes selected by the corresponding XPath axis from a context $\gamma$.



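Path                           Logical formula
$\gamma$/self::*               $\gamma$
$\gamma$/child::*              $\langle \triangleleft^{\star}, \vartriangle \rangle \gamma$
$\gamma$/parent::*             $\langle\triangledown\rangle\langle\triangleright^{\star}\rangle \gamma$
$\gamma$/descendant::*         $\langle (\triangleleft \mid \vartriangle)^{\star}, \vartriangle \rangle \gamma$
$\gamma$/ancestor::*           $\langle\triangledown\rangle\langle(\triangledown \mid \triangleright)^{\star}\rangle \gamma$
$\gamma$/following-sibling::*  $\langle\triangleleft\rangle\langle\triangleleft^{\star}\rangle \gamma$
$\gamma$/preceding-sibling::*  $\langle\triangleright\rangle\langle\triangleright^{\star}\rangle \gamma$

Figure 13: XPath axes as modalities over binary trees.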
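
The binary view of unranked trees behind Figure 13 can be illustrated with a short sketch. It assumes the usual first-child / next-sibling encoding and a made-up representation of unranked trees as `(label, [children])` pairs; the function names are hypothetical. The child axis is read here from the context node downward: one first-child step followed by any number of next-sibling steps.

```python
def encode(tree):
    """Return (labels, first_child, next_sibling), keyed by preorder node ids."""
    labels, first_child, next_sibling = {}, {}, {}

    def visit(node):
        ident = len(labels)
        labels[ident] = node[0]
        previous = None
        for child in node[1]:
            cid = visit(child)
            if previous is None:
                first_child[ident] = cid      # first-child step
            else:
                next_sibling[previous] = cid  # next-sibling step
            previous = cid
        return ident

    visit(tree)
    return labels, first_child, next_sibling


def children(node, first_child, next_sibling):
    """child axis: one first-child step, then next-sibling steps."""
    result, current = [], first_child.get(node)
    while current is not None:
        result.append(current)
        current = next_sibling.get(current)
    return result
```

The converse reading (from a child back to its parent through previous-sibling steps and one parent step) corresponds to the formula given for the child axis in the figure.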

Let us consider another example (XPath expression $e_1$):

child::a/child::b[count(child::e/descendant::h)>3]

Starting from a given context in a tree, this XPath expression navigates to
children nodes named "a" and selects their children named "b". Finally, it
retains only those "b" nodes for which the qualifier between brackets holds.
The first path can be translated in the logic as follows:

$\vartheta = b \wedge \mu x.\langle\vartriangle\rangle(a \wedge \mu x'.\langle\vartriangle\rangle\top \vee \langle\triangleleft\rangle x') \vee \langle\triangleleft\rangle x$

This example requires a more sophisticated translation in the logic. This is
because it makes implicit that "e" nodes (whose existence is simply tested for
counting purposes) must be children of selected "b" nodes. The translation of
the full aforementioned XPath expression is as follows:

$\vartheta \wedge @n \wedge \langle (\vartriangle \mid \triangleleft)^{\star}, (\triangledown \mid \triangleright)^{\star} \rangle_{>3}\, \eta$

where $@n$ is a fresh nominal used to mark a "b" node which is filtered by
the qualifier and the formula $\eta$ describes the counted "h" nodes:

$\eta = h \wedge \mu x.\langle\vartriangle\rangle(e \wedge \mu x'.\langle\vartriangle\rangle @n \vee \langle\triangleleft\rangle x') \vee \langle\triangleleft\rangle x \vee \langle\vartriangle\rangle x$

Intuitively, the general idea behind the translation is to first translate the leading
path, use a fresh nominal for marking a node which is filtered, then find more than
3 instances of "h" nodes from which we can reach back the marked node via
the inverse path of the counting formula.
Since trails make it possible to navigate but not to test properties (like
existence of labels), we test for labels in the counted formula $\eta$ and we use a
general navigation $(\vartriangle \mid \triangleleft)^{\star}, (\triangledown \mid \triangleright)^{\star}$ to look for counted nodes everywhere
in the tree. Introducing the nominal is necessary to bind the context properly
(without loss of information). Indeed, the XPath expression $e_1$ makes implicit
that an "e" node must be a child of a "b" node selected by the outer path. Using
a nominal, we restore this property by connecting the counted nodes to the
initial single context node.

Lemma 4.1. The translation of Core XPath expressions with counting con- 
straints into the logic is linear. 

It is proven by structural induction in a similar manner to [GLS07] (in which
the translation is proven for expressions without counting constraints). For
counting formulas, the use of nominals and the general (constant-size) counting
trail make it possible to avoid duplication of trails so that the translation remains
linear.

Corollary 4.2. The equivalence problem for expressions of the form:

PathExpr[count(PathExpr') # k]

where # ∈ {<, >, =} and k is a constant, is decidable. More specifically, the
equivalence problem can be decided in exponential time in terms of the expression
size and the highest nesting level of counting formulas.

4.2 Regular Tree Languages with Cardinality Constraints 

Regular tree grammars capture most of the schemas in use today [MLMK05].
The logic can express all regular tree languages (it is easy to prove that regular
expression types in the manner of e.g., [HVP05] can be linearly translated into
the logic: see [GLS07]).

In practice, schema languages often provide shorthands for expressing car- 
dinality constraints on node occurrences. XML Schema notably offers two at- 
tributes minOccurs and maxOccurs for this purpose. For instance, the following 
XML schema definition: 

<xsd:element name="a">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="b" minOccurs="4" maxOccurs="9"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

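is a notation that restricts the number of occurrences of "b" nodes to be at
least 4 and at most 9, as children of "a" nodes. The goal here is to have
a succinct notation for expressing regular languages which could otherwise be
exponentially large if written with usual regular expression operators. The above
regular requirement can be translated as the formula:

$\phi \wedge \langle\triangledown\rangle(\langle\triangleright^{\star}\rangle_{>3}\, b \wedge \langle\triangleright^{\star}\rangle_{\leq 9}\, b)$

where $\phi$ corresponds to the regular tree type a[b*] as follows:

$\phi = (a \wedge (\neg\langle\triangledown\rangle\top \vee \langle\triangledown\rangle\psi)) \wedge \neg\langle\triangleright\rangle\top$

$\psi = \mu x.\, (b \wedge \neg\langle\triangledown\rangle\top \wedge \neg\langle\triangleright\rangle\top) \vee (b \wedge \neg\langle\triangledown\rangle\top \wedge \langle\triangleright\rangle x)$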
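
The succinctness argument for occurrence bounds can be illustrated on plain strings. The sketch below is an illustration only, using Python's `re` module: it spells out the bounded repetition `b{4,9}` with concatenation and the optional operator alone (the helper `expand` is hypothetical). The expanded pattern grows with the numeric value of the bounds, hence exponentially in the number of digits used to write them.

```python
import re


def expand(symbol, low, high):
    """Spell out symbol{low,high} using only concatenation and '?'."""
    return symbol * low + (symbol + "?") * (high - low)


bounded = re.compile(r"b{4,9}")           # counting notation
expanded = re.compile(expand("b", 4, 9))  # same language without counting
```

Both patterns accept exactly the sequences of 4 to 9 "b" symbols, which mirrors the conjunction of a "more than 3" and an "at most 9" counting formula in the translation above.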

This example only involves counting over children nodes. The logic allows
counting through more general trails, and in particular arbitrarily deep trails.
Trails corresponding to the XPath axes "preceding, ancestor, following" can be
used to constrain the context of a schema. The "descendant" trail can be used
to specify additional constraints over the subtree defined by a given schema.
For instance, suppose we want to forbid webpages containing nested anchors
"a" (whose interpretation makes no sense for web browsers). We can build the
logical formula $f$ which is the conjunction of a considered schema for webpages
(e.g. XHTML) with the formula a/descendant::a in XPath notation. Nested
anchors are forbidden by the considered schema iff $f$ is unsatisfiable.

As another example, suppose we want paragraph nodes ("p" nodes) not to be
nested inside more than 3 unordered lists ("ul" nodes), regardless of the schema
defining the context. One may check for the unsatisfiability of the following
formula:

$p \wedge \langle (\vartriangle \mid \triangleleft)^{\star}, \vartriangle \rangle_{>3}\, ul$
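
The property expressed by this formula can be tested on one concrete document, which is a different (and much easier) problem than the satisfiability check discussed here. The following sketch, with a hypothetical function name and Python's `xml.etree.ElementTree`, collects the "p" elements that have more than three "ul" ancestors.

```python
import xml.etree.ElementTree as ET


def deeply_nested_paragraphs(root, limit=3):
    """Return the <p> elements having more than `limit` ancestors named <ul>."""
    found = []

    def visit(element, ul_above):
        if element.tag == "p" and ul_above > limit:
            found.append(element)
        # Count this element as an ancestor of its own children.
        below = ul_above + (1 if element.tag == "ul" else 0)
        for child in element:
            visit(child, below)

    visit(root, 0)
    return found
```

A document for which this function returns a non-empty list is a witness that the formula is satisfiable in the unconstrained setting; proving that no document of a given schema contains such a node requires the decision procedure.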

5 Related Work 

Counting over graphs. The μ-calculus is a propositional modal logic augmented
with least and greatest fixpoint operators [Koz82]. Kupferman, Sattler
and Vardi study a μ-calculus with graded modalities where one can express,
e.g., that a node has at least n successors satisfying a certain property [KSV02].
The modalities are limited in scope since they only count children of a given
node.

The μ-calculus has been recently extended with inverse modalities [Var98],
nominals [SV01], and graded modalities [KSV02]. If only two of the above
constructs are considered, satisfiability of the enriched calculus is
EXPTIME-complete [BLMV06]. However, if all of the above constructs are considered
simultaneously, the calculus becomes undecidable [BLMV06]. Fortunately, this
undecidability result for the case of graphs does not preclude decidable tree
logics combining such features.

Counting over trees. The notion of Presburger Automata for trees, combining
both regular constraints on the children of nodes and numerical constraints
given by Presburger formulas, has independently been introduced by Dal Zilio
and Lugiez [DZLM04] and Seidl et al. [SSMH04]. Specifically, Dal Zilio and
Lugiez [DZLM04] propose a modal logic for unordered trees called Sheaves logic.
This logic makes it possible to impose certain arithmetical constraints on children nodes
but lacks recursion (i.e., fixpoint operators) and inverse navigation. Dal Zilio
and Lugiez consider the satisfiability and the membership problems. Demri
and Lugiez [DL06] showed by means of an automata-free decision procedure
that this logic is only PSPACE-complete. Restrictions like "p1 nodes have no
more children than p2 nodes" are expressible by this approach. Seidl et al.
[SSMH04] introduce a fixpoint Presburger logic, which, in addition to numerical
constraints on children nodes, also supports recursive forward navigation.
For example, expressions like "the descendants of p1 nodes have no more
children than the number of children of descendants of p2 nodes" are allowed. This
means that constraints can be imposed on sibling nodes (even if they are deep
in the tree) by forward recursive navigation but not on distant nodes which are
not siblings.

Compared to the work presented here, neither of the two previous approaches
can support constraints like "there are more than 5 ancestors of "p" nodes".

Furthermore, due to the lack of backward navigation, the works found in
[DZLM04, SSMH04, DL06] are not suited for succinctly capturing XPath
expressions. Indeed, it is well-known that expressions with backward modalities
are exponentially more succinct than their forward-only counterparts [OMFB02,
GR05].

There is little hope of pushing the decidability envelope much further for counting
constraints. Indeed, it is known from [KR03, DL06, tCM09] that the equivalence
problem is undecidable for XPath expressions with counting operators of
the form:

• PathExpr1[count(PathExpr2) = count(PathExpr3)], or

• PathExpr1[position() = count(PathExpr2)].

This is the reason why logical frameworks that allow comparisons between counting
operators limit counting by restricting the PathExpr to immediate children
nodes [DZLM04, SSMH04]. In this paper, we chose a different tradeoff: comparisons
are restricted to constants but at the same time comparisons along more
general paths are permitted.

6 Conclusion 

We introduced a modal logic of trees equipped with (1) converse modalities,
which allow forward and backward navigation to be expressed succinctly, (2) a least
fixpoint operator for recursion, and (3) cardinality constraint operators for
expressing numerical occurrence constraints on tree nodes satisfying some regular
properties. A sound and complete algorithm is presented for testing satisfiability
of logical formulas. This result is surprising since the corresponding logic for
graphs is undecidable [BLMV06].

The decision procedure for the logic is exponential time w.r.t. the formula
size. The logic captures regular tree languages with cardinality restrictions, as
well as the navigational fragment of XPath equipped with counting features.
Similarly to backward modalities, numerical constraints do not extend the logical
expressivity beyond regular tree languages. Nevertheless they enhance the
succinctness of the formalism as they provide useful shorthands for otherwise
exponentially large formulas.

This makes it possible to extend static analysis to a larger set of XPath and
XML schema features in a more efficient way. We believe the field of application
of this logic may go beyond the XML setting. For example, in verification of
linked data structures [ZKR08, HIV06], reasoning on tree structures with
in-depth cardinality constraints seems a major issue. Our result may help building
solvers that are attractive alternatives to those based on non-elementary logics
such as SkS [TW68], like, e.g., Mona [KM01].